

Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework

Neural Information Processing Systems

This paper develops the Pontryagin differentiable programming (PDP) methodology, a unified framework for solving a broad class of learning and control tasks. PDP is distinguished from existing methods by two novel techniques. First, we differentiate through Pontryagin's Maximum Principle, which yields the analytical derivative of a trajectory with respect to tunable parameters within an optimal control system, enabling end-to-end learning of dynamics, policies, and/or control objective functions. Second, we propose an auxiliary control system in the backward pass of the PDP framework; the output of this auxiliary control system is the analytical derivative of the original system's trajectory with respect to the parameters, which can be iteratively solved using standard control tools. We investigate three learning modes of the PDP: inverse reinforcement learning, system identification, and control/planning. We demonstrate the capability of the PDP in each learning mode on different high-dimensional systems, including a multi-link robot arm, a 6-DoF maneuvering UAV, and 6-DoF rocket powered landing.
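The core idea of the auxiliary control system can be illustrated on a toy linear system (a sketch, not the paper's actual implementation): for x_{t+1} = A(theta) x_t + B u_t, the sensitivity s_t = dx_t/dtheta obeys its own linear "auxiliary" system s_{t+1} = A(theta) s_t + (dA/dtheta) x_t with s_0 = 0, which can be rolled out alongside the original trajectory. All matrices and the scalar parameter theta below are illustrative.

```python
import numpy as np

def rollout_with_sensitivity(theta, x0, controls, T):
    """Roll out x_{t+1} = A(theta) x_t + B u_t together with the
    auxiliary sensitivity system s_{t+1} = A s_t + dA/dtheta x_t."""
    A = np.array([[1.0, 0.1], [0.0, theta]])   # A(theta), toy example
    dA = np.array([[0.0, 0.0], [0.0, 1.0]])    # dA/dtheta
    B = np.array([[0.0], [0.1]])
    x, s = x0.copy(), np.zeros_like(x0)
    xs, ss = [x], [s]
    for t in range(T):
        # Both updates use the pre-update state x_t.
        x, s = A @ x + B @ controls[t], A @ s + dA @ xs[-1]
        xs.append(x)
        ss.append(s)
    return np.array(xs), np.array(ss)

x0 = np.array([1.0, 0.0])
u = [np.array([0.1])] * 10
xs, ss = rollout_with_sensitivity(0.9, x0, u, 10)

# Sanity check: the auxiliary system's output matches finite differences.
eps = 1e-6
xs_p, _ = rollout_with_sensitivity(0.9 + eps, x0, u, 10)
assert np.allclose(ss, (xs_p - xs) / eps, atol=1e-4)
```

PDP generalizes this observation to optimal control: the derivative of an optimal trajectory with respect to tunable parameters is itself the trajectory of an auxiliary control system, solvable with standard control tools.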


Smart Traffic Signals: Comparing MARL and Fixed-Time Strategies

Mahato, Saahil

arXiv.org Artificial Intelligence

Urban traffic congestion, particularly at intersections, significantly affects travel time, fuel consumption, and emissions. Traditional fixed-time signal control systems often lack the adaptability to effectively manage dynamic traffic patterns. This study explores the application of multi-agent reinforcement learning (MARL) to optimize traffic signal coordination across multiple intersections within a simulated environment. A simulation was developed to model a network of interconnected intersections with randomly generated vehicle flows to reflect realistic traffic variability. A decentralized MARL controller was implemented in which each traffic signal operates as an autonomous agent, making decisions based on local observations and information from neighboring agents. Performance was evaluated against a baseline fixed-time controller using metrics such as average vehicle wait time and overall throughput. The MARL approach yielded statistically significant improvements, including reduced average waiting times and higher throughput. These findings suggest that MARL-based dynamic control strategies hold substantial promise for improving urban traffic management efficiency. Further research is recommended to address the challenges of scalability and real-world implementation.
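A minimal sketch of the decentralized setup described above (hypothetical, not the paper's code): each intersection is an independent tabular Q-learner whose state would combine local queue lengths with a coarse signal from neighboring agents. Class and action names are illustrative.

```python
import random
from collections import defaultdict

class SignalAgent:
    """One traffic-signal agent doing epsilon-greedy tabular Q-learning."""

    def __init__(self, actions=("NS_GREEN", "EW_GREEN"),
                 alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)   # (state, action) -> value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Explore with probability epsilon, otherwise pick the greedy action.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning temporal-difference update; the reward would
        # be, e.g., the negative total waiting time at this intersection.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In a full simulation, each agent's `state` would encode its own queue lengths plus neighbors' current phases, giving the decentralized coordination the abstract describes.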


BUDD-e: an autonomous robotic guide for visually impaired users

Li, Jinyang, Farina, Marcello, Mozzarelli, Luca, Cattaneo, Luca, Rattamasanaprapai, Panita, Tagarelli, Eleonora A., Corno, Matteo, Perego, Paolo, Andreoni, Giuseppe, Lettieri, Emanuele

arXiv.org Artificial Intelligence

This paper describes the design and realization of a prototype of the novel guide robot BUDD-e for visually impaired users. The robot has been tested in a real scenario with the help of visually disabled volunteers at ASST Grande Ospedale Metropolitano Niguarda, in Milan. The results of the experimental campaign are thoroughly described in the paper, displaying its remarkable performance and user acceptance. Index Terms: Assistive technologies, autonomous navigation, autonomous robotics, autonomous guide for visually impaired users. According to [1], in 2020 the number of totally blind people was estimated at about 49.1 million (about 0.6% of the world population), while people with severe and moderate vision problems were estimated at 33.6 million (about 0.4% of the world population) and 221.4 million (about 2.8% of the world population), respectively. Furthermore, due to an aging population, the rate of people affected by vision problems is expected to continue increasing in the coming decades [2]. People with visual impairments currently face a number of issues when visiting public spaces and using services. It is very difficult for blind and partially sighted persons to access shared places (areas where cars, buses, pedestrians, and cyclists share the same space) alone, since important inclusive environmental aids are frequently removed in communal areas. As discussed in [3], navigating inside a shopping mall can be tiring and stressful for a blind or low-vision person. Shopping in groceries is practically impossible, and shopping centers often do not have enough staff on duty to offer help.


Heuristic Step Planning for Learning Dynamic Bipedal Locomotion: A Comparative Study of Model-Based and Model-Free Approaches

Suliman, William, Chaikovskaia, Ekaterina, Davydenko, Egor, Gorbachev, Roman

arXiv.org Artificial Intelligence

This work presents an extended framework for learning-based bipedal locomotion that incorporates a heuristic step-planning strategy guided by desired torso velocity tracking. The framework enables precise interaction between a humanoid robot and its environment, supporting tasks such as crossing gaps and accurately approaching target objects. Unlike approaches based on full or simplified dynamics, the proposed method avoids complex step planners and analytical models. Step planning is primarily driven by heuristic commands, while a Raibert-type controller modulates the foot placement length based on the error between desired and actual torso velocity. We compare our method with a model-based step-planning approach -- the Linear Inverted Pendulum Model (LIPM) controller. Experimental results demonstrate that our approach attains comparable or superior accuracy in maintaining target velocity (up to 80%), significantly greater robustness on uneven terrain (over 50% improvement), and improved energy efficiency. These results suggest that incorporating complex analytical, model-based components into the training architecture may be unnecessary for achieving stable and robust bipedal walking, even in unstructured environments.
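The Raibert-type modulation described above can be sketched in a few lines (gains and names here are illustrative assumptions, not the paper's values): the nominal step length is corrected by the error between actual and desired torso velocity.

```python
def raibert_step_length(v_actual, v_desired, stance_duration, k_v=0.05):
    """Raibert-style foot-placement heuristic (illustrative sketch).

    v_actual:        current torso velocity (m/s)
    v_desired:       commanded torso velocity (m/s)
    stance_duration: expected stance time (s)
    k_v:             velocity-feedback gain (assumed value)
    """
    # Nominal placement: half the distance the torso travels during stance.
    nominal = v_actual * stance_duration / 2.0
    # Feedback: step farther when faster than desired (to brake),
    # shorter when slower (to accelerate).
    correction = k_v * (v_actual - v_desired)
    return nominal + correction
```

In the described framework, a learned policy would track the resulting foot target while the heuristic itself stays model-free, avoiding a full LIPM-style analytical planner.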


BRAVE: Brain-Controlled Prosthetic Arm with Voice Integration and Embodied Learning for Enhanced Mobility

Basit, Abdul, Nawaz, Maha, Shafique, Muhammad

arXiv.org Artificial Intelligence

Non-invasive brain-computer interfaces (BCIs) have the potential to enable intuitive control of prosthetic limbs for individuals with upper limb amputations. However, existing EEG-based control systems face challenges related to signal noise, classification accuracy, and real-time adaptability. In this work, we present BRAVE, a hybrid EEG and voice-controlled prosthetic system that integrates ensemble learning-based EEG classification with a human-in-the-loop (HITL) correction framework for enhanced responsiveness. Unlike traditional electromyography (EMG)-based prosthetic control, BRAVE aims to interpret EEG-driven motor intent, enabling movement control without reliance on residual muscle activity. To improve classification robustness, BRAVE combines LSTM, CNN, and Random Forest models in an ensemble framework, achieving a classification accuracy of 96% across test subjects. EEG signals are preprocessed using a bandpass filter (0.5-45 Hz), Independent Component Analysis (ICA) for artifact removal, and Common Spatial Pattern (CSP) feature extraction to minimize contamination from electromyographic (EMG) and electrooculographic (EOG) signals. Additionally, BRAVE incorporates automatic speech recognition (ASR) to facilitate intuitive mode switching between different degrees of freedom (DOF) in the prosthetic arm. The system operates in real time, with a response latency of 150 ms, leveraging Lab Streaming Layer (LSL) networking for synchronized data acquisition. The system is evaluated on an in-house fabricated prosthetic arm and with multiple participants, highlighting its generalizability across users. The system is optimized for low-power embedded deployment, ensuring practical real-world application beyond high-performance computing environments. Our results indicate that BRAVE offers a promising step towards robust, real-time, non-invasive prosthetic control.
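The first preprocessing step mentioned above, a 0.5-45 Hz bandpass filter, can be sketched as follows. The filter design (4th-order Butterworth, zero-phase `filtfilt`) and the 250 Hz sampling rate are assumptions for illustration; the abstract specifies only the passband.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_eeg(data, fs, low=0.5, high=45.0, order=4):
    """Zero-phase Butterworth bandpass for EEG.

    data: array of shape (n_channels, n_samples); fs: sampling rate in Hz.
    """
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, data, axis=-1)

fs = 250  # assumed EEG sampling rate
t = np.arange(0, 2.0, 1.0 / fs)
# Synthetic channel: 10 Hz alpha-band signal plus 60 Hz line noise.
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)
clean = bandpass_eeg(raw[np.newaxis, :], fs)[0]
```

After this stage, the described pipeline would pass the filtered signals through ICA for artifact removal and CSP for feature extraction before the ensemble classifier.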


Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach

Young, Rory, Pugeault, Nicolas (School of Computing Science, University of Glasgow)

Neural Information Processing Systems

Deep reinforcement learning agents achieve state-of-the-art performance in a wide range of simulated control tasks. However, successful applications to real-world problems remain limited. One reason for this dichotomy is that the learnt policies are not robust to observation noise or adversarial attacks. In this paper, we investigate the robustness of deep RL policies to a single small state perturbation in deterministic continuous control tasks.
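The underlying idea can be illustrated with a toy version of the standard Benettin procedure (an illustrative sketch, not the paper's method): estimate the maximal Lyapunov exponent of a closed-loop map by tracking how a small state perturbation grows under repeated application, renormalizing at each step. Here the one-dimensional logistic map stands in for the learnt policy-plus-dynamics map.

```python
import numpy as np

def max_lyapunov_exponent(step, x0, n_steps=2000, eps=1e-8):
    """Estimate the maximal Lyapunov exponent of the map `step`
    by following a perturbed copy of the trajectory."""
    x, x_pert = x0, x0 + eps
    total = 0.0
    for _ in range(n_steps):
        x, x_pert = step(x), step(x_pert)
        d = abs(x_pert - x)
        total += np.log(d / eps)
        # Renormalize the perturbation so it stays infinitesimal.
        x_pert = x + eps * (x_pert - x) / d
    return total / n_steps

chaotic = lambda x: 4.0 * x * (1.0 - x)   # logistic map, r = 4 (chaotic)
stable = lambda x: 0.5 * x                # contracting map

le_chaotic = max_lyapunov_exponent(chaotic, 0.3)  # positive: perturbations grow
le_stable = max_lyapunov_exponent(stable, 0.3)    # negative: perturbations decay
```

A positive exponent means nearby trajectories diverge exponentially, which is exactly the sensitivity to small state perturbations the paper investigates; a robust closed loop would keep this quantity negative.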




Supplementary Materials for the Pontryagin Differentiable Programming Paper: A. Proof of Lemma 5.1

Neural Information Processing Systems

To prove Lemma 5.1, we only need to show that Pontryagin's Maximum Principle for the auxiliary control system coincides with the differential PMP. The derivation uses (S.3) and standard matrix trace properties (e.g., Tr(A) = Tr(A^T)). Since the resulting PMP equations (S.2) are the same as the differential PMP in (13), the lemma follows. Based on Lemma 5.1 and its proof, we know that the PMP of the auxiliary control system, (S.2), is exactly the differential PMP equations (13). From (S.2c), we solve for U; a proof by induction starting from (S.2d) shows that (S.8) holds at every step. This completes the proof.

D. Algorithm Details for Different Learning Modes. One can first run SysID Mode, then use the learned dynamics as the initial guess in IRL/IOC Mode. The quadrotor's control objective function is designed to achieve SE(3) maneuvering. Fig. S1 gives detailed results of imitation loss versus iteration, Fig. S2 gives detailed results of SysID loss versus iteration, and further results appear in Fig. S5. On the cart-pole and robot-arm systems (Figures 1a and 1b), we learn a feedback policy by minimizing given control objective functions; Fig. S3 gives the detailed results of control loss (i.e., the value of the control objective). As seen in Fig. S3 and Fig. S6 (in Fig. S6, PDP produces a simulated trajectory closer to the optimal one), PDP outperforms GPS in terms of lower control cost (loss).